DFSS

Table of Contents:

Formula Collection

Basic Concepts and Set Operations – Calculation Rules for Set Operations

Rule Arithmetic Operation
Commutative rule \[ A \cup B = B \cup A \]
\[ A \cap B = B \cap A \]
Associative rule \[ (A \cup B) \cup C = A \cup (B \cup C) \]
\[ (A \cap B) \cap C = A \cap (B \cap C) \]
Distributive rule \[ (A \cup B) \cap C = (A \cap C) \cup (B \cap C) \]
\[ (A \cap B) \cup C = (A \cup C) \cap (B \cup C) \]
De Morgan's rules \[ (A \cup B)' = A' \cap B' \]
\[ (A \cap B)' = A' \cup B' \]
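These rules can be checked numerically with Python's built-in set type; the sets U, A and B below are arbitrary examples:

```python
# Quick numerical check of De Morgan's rules using Python sets.
U = set(range(10))      # universal set (example)
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

complement = lambda s: U - s

# (A ∪ B)' == A' ∩ B'
assert complement(A | B) == complement(A) & complement(B)
# (A ∩ B)' == A' ∪ B'
assert complement(A & B) == complement(A) | complement(B)
print("De Morgan's rules hold for this example")
```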

Expectation Values of Random Variables

Arithmetic Operation Behaviour of the Expectation Value
Expectation Value of a Constant \[ E(k) = k \]
Linearity \[ E(a \cdot h(x) + b \cdot g(x)) = a \cdot E(h(x)) + b \cdot E(g(x)) \]
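A quick numerical check of the linearity rule for a discrete random variable; the values, probabilities, constants and the functions h and g below are made up for illustration:

```python
import numpy as np

# Discrete random variable X with values x and probabilities p (example data).
x = np.array([1.0, 2.0, 3.0])
p = np.array([0.2, 0.5, 0.3])   # probabilities, sum to 1
a, b = 2.0, -1.0
h = lambda v: v**2
g = lambda v: v + 1

# E(a*h(X) + b*g(X)) computed directly ...
lhs = np.sum((a * h(x) + b * g(x)) * p)
# ... and via linearity: a*E(h(X)) + b*E(g(X))
rhs = a * np.sum(h(x) * p) + b * np.sum(g(x) * p)
print(np.isclose(lhs, rhs))  # True
```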

Theorems on Probability According to Kolmogorov

Rule Calculation Rule
Addition Theorem \[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
Addition Theorem of Complementary Events \[ P(A) + P(A') = 1 \]
Addition Theorem of Mutually Exclusive Events \[ P(A_1 \cup ... \cup A_n) = P(A_1) + ... + P(A_n) \]
Conditional Probability \[ P(B|A) = \frac{P(A \cap B)}{P(A)} \]
Multiplication Theorem \[ P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A) \]
Statistical Independence \[ P(A|B) = P(A) \] \[ P(B|A) = P(B) \] \[ P(A \cap B) = P(A) \cdot P(B) \]
Total Probability \[ P(B) = \sum_{n = 1}^{N} P(B|A_n) \cdot P(A_n) \]
Bayes' Theorem \[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} = \frac{P(B|A) \cdot P(A)}{\sum_{n = 1}^{N} P(B|A_n) \cdot P(A_n)} \]
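The last two rows combine naturally: a short sketch of total probability and Bayes' theorem with made-up numbers (a prior P(A), a detection rate P(B|A) and a false-positive rate P(B|A')):

```python
# Example numbers (made up): a test for a condition.
P_A = 0.01             # prior P(A)
P_B_given_A = 0.95     # P(B|A), detection rate
P_B_given_notA = 0.05  # P(B|A'), false-positive rate

# Total probability: P(B) = P(B|A)·P(A) + P(B|A')·P(A')
P_B = P_B_given_A * P_A + P_B_given_notA * (1 - P_A)

# Bayes' theorem: P(A|B) = P(B|A)·P(A) / P(B)
P_A_given_B = P_B_given_A * P_A / P_B
print(round(P_A_given_B, 4))  # → 0.161 for these numbers
```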

Estimation of unknown parameters

\[ \gamma = 1 - \alpha \]

\[ c_1 = F^{-1}\left(\frac{1 - \gamma}{2}\right) = F^{-1}\left(\frac{\alpha}{2}\right) \]

\[ c_2 = F^{-1}\left(\frac{1 + \gamma}{2}\right) = F^{-1}\left(1 - \frac{\alpha}{2}\right) \]

Parameter Distribution of the Estimator Prediction Range
Mean value \( \mu \) with known variance Standard normal distribution \[z = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}}\] \[\bar{x} - \frac{c_2 \cdot \sigma}{\sqrt{N}} \leq \mu \leq \bar{x} - \frac{c_1 \cdot \sigma}{\sqrt{N}}\]
Variance \( \sigma^2 \) Chi-square distribution with \( N - 1 \) degrees of freedom \[\chi^2 = (N - 1) \cdot \frac{s^2}{\sigma^2}\] \[\frac{s^2 \cdot (N - 1)}{c_2} \leq \sigma^2 \leq \frac{s^2 \cdot (N - 1)}{c_1}\]
Mean value \( \mu \) with unknown variance t distribution with \( N - 1 \) degrees of freedom \[t = \frac{\bar{x} - \mu}{s / \sqrt{N}}\] \[\bar{x} - \frac{c_2 \cdot s}{\sqrt{N}} \leq \mu \leq \bar{x} - \frac{c_1 \cdot s}{\sqrt{N}}\]
Difference of two mean values \( \mu_1 - \mu_2 \) with known variance Standard normal distribution \[z = \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2 \cdot (\frac{1}{N} + \frac{1}{M})}}\] \[\mu_{c1} < (\mu_1 - \mu_2) \leq \mu_{c2}\] with \[\mu_{c1} = (\bar{x_1} - \bar{x_2}) - c_2 \cdot \sqrt{\frac{1}{N} + \frac{1}{M}} \cdot \sigma\] and \[\mu_{c2} = (\bar{x_1} - \bar{x_2}) - c_1 \cdot \sqrt{\frac{1}{N} + \frac{1}{M}} \cdot \sigma\]
Difference of two mean values \( \mu_1 - \mu_2 \) with unknown variance t distribution with \( N + M - 2 \) degrees of freedom \[t = \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\frac{1}{N} + \frac{1}{M} }\cdot s}\] with \[s = \sqrt{\frac{s_N^2 \cdot (N-1) + s_M^2 \cdot (M-1)}{N + M - 2}}\] \[\mu_{c1} < (\mu_1 - \mu_2) \leq \mu_{c2}\] with \[\mu_{c1} = (\bar{x_1} - \bar{x_2}) - c_2 \cdot \sqrt{\frac{1}{N} + \frac{1}{M}} \cdot s\] and \[\mu_{c2} = (\bar{x_1} - \bar{x_2}) - c_1 \cdot \sqrt{\frac{1}{N} + \frac{1}{M}} \cdot s\]
Ratio of two sample variances f-distribution with \( (N - 1, M - 1) \) degrees of freedom \[f = \frac{s_1^2}{s_2^2} \cdot \frac{\sigma_2^2}{\sigma_1^2}\] \[\frac{s_2^2}{s_1^2} \cdot c_1 < \frac{\sigma_2^2}{\sigma_1^2} \leq \frac{s_2^2}{s_1^2} \cdot c_2\]

Prediction ranges of populations

Parameter Distribution of the Estimator Prediction Range
Known mean \( \mu \) known variance \( \sigma^2 \) standard normal distribution \[ z = \frac{x - \mu}{\sigma} \] \[ \mu + c_1 \cdot \sigma \leq x \leq \mu + c_2 \cdot \sigma \]
Unknown mean \( \mu \) known variance \( \sigma^2 \) Standard normal distribution \[ z = \frac{x - \bar{x}}{\sigma \cdot \sqrt{1 + \frac{1}{N}}} \] \[ \bar{x} + c_1 \cdot \sigma \cdot \sqrt{1 + \frac{1}{N}} < x \leq \bar{x} + c_2 \cdot \sigma \cdot \sqrt{1 + \frac{1}{N}} \]
Known mean \( \mu \) unknown variance \( \sigma^2 \) t-distribution with \( N-1 \) degrees of freedom \[ t = \frac{x - \mu}{s} \] \[ \mu + c_1 \cdot s < x \leq \mu + c_2 \cdot s \]
Unknown mean \( \mu \) unknown variance \( \sigma^2 \) t-distribution with \( N-1 \) degrees of freedom \[ t = \frac{x - \bar{x}}{s \cdot \sqrt{1 + \frac{1}{N}}} \] \[ \bar{x} + c_1 \cdot s \cdot \sqrt{1 + \frac{1}{N}} < x \leq \bar{x} + c_2 \cdot s \cdot \sqrt{1 + \frac{1}{N}} \]
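As a sketch, the last row of the table (unknown \( \mu \) and unknown \( \sigma^2 \)) can be computed with scipy.stats.t; the sample x below is an arbitrary example:

```python
import numpy as np
from scipy.stats import t

# Prediction range for one future observation, mu and sigma^2 unknown.
x = np.array([4.9, 5.1, 5.0, 4.8, 5.2, 5.0])  # example sample
N = len(x)
alpha = 0.05

x_mean = np.mean(x)
s = np.std(x, ddof=1)   # sample standard deviation

c1 = t.ppf(alpha / 2, df=N - 1)
c2 = t.ppf(1 - alpha / 2, df=N - 1)

lower = x_mean + c1 * s * np.sqrt(1 + 1 / N)
upper = x_mean + c2 * s * np.sqrt(1 + 1 / N)
print(f"{lower:.3f} < x <= {upper:.3f}")
```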

Hypothesis Testing for the parameters of a normal distribution

First and Second Type Errors

Parameter Function Hypothesis testing Distribution Comments
Mean value with known variance Testing from a sample with sample size N of a normally distributed random variable with known variance for a mean value µ0 One-sample z-test Normal distribution Test can be defined one-sided or two-sided
Variance Testing from a sample with sample size N of a normally distributed random variable for \( \sigma_0 \) One-Sample Chi2-Test Chi2-distribution with N - 1 degrees of freedom
Mean value with unknown variance Testing of a sample with sample size N of a normally distributed random variable with unknown variance to a mean value µ0 One-sample-t-test t distribution with N - 1 degrees of freedom Test can be defined one-sided or two-sided
Difference in mean values for paired samples Difference in mean values for paired samples One-sample-z-test Normal distribution Test can be defined one-sided or two-sided
Difference of the mean values with known variance Testing two samples with different sample sizes of a normally distributed random variable with known variance for the difference in means µ0 Two-Sample-z-test Normal distribution Test can be defined one-sided or two-sided
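A minimal sketch of the two-sided one-sample z-test from the first row of the table; the sample values, \( \mu_0 \) and \( \sigma \) are made up:

```python
import numpy as np
from scipy.stats import norm

# One-sample z-test, H0: mu = mu0, known sigma (example data).
x = np.array([3.1, 2.9, 3.3, 3.2, 3.0])
mu0 = 3.0
sigma = 0.2   # known standard deviation
alpha = 0.05

N = len(x)
z = (np.mean(x) - mu0) / (sigma / np.sqrt(N))
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided p-value

print(f"z = {z:.3f}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "cannot reject H0")
```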

Check if data is normally distributed

Kolmogorov-Smirnov Test

Can be used as a test for normal distribution if parameters of the distribution are known

\[ d_c = \sqrt{\frac{\ln\left(\frac{2}{\alpha}\right)}{2 \cdot N}} \]
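A sketch combining this critical value with scipy.stats.kstest; the sample is simulated, and the parameters of the reference distribution are assumed known:

```python
import numpy as np
from scipy.stats import kstest, norm

# Simulated sample; loc and scale of the reference normal are assumed known.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)

N = len(x)
alpha = 0.05
d_c = np.sqrt(np.log(2 / alpha) / (2 * N))  # critical value from the formula above

d, p = kstest(x, cdf=norm(loc=5.0, scale=2.0).cdf)
print(f"d = {d:.3f}, d_c = {d_c:.3f}")
print("normality not rejected" if d < d_c else "normality rejected")
```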

Anderson-Darling Test

Checks the symmetry of a distribution and compares it with the symmetry of a normal distribution

\[ A = -N - \frac{1}{N} \cdot \sum_{n = 1}^{N} (2 \cdot n - 1) \cdot \left( \ln(F(x_n)) + \ln(1 - F(x_{N-n+1})) \right) \]

\[ A_D = A \cdot \left( 1 + \frac{3}{4 \cdot N} + \frac{9}{8 \cdot N^2} \right) \]
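scipy.stats.anderson computes a corrected statistic for normality directly; note that scipy estimates \( \mu \) and \( \sigma \) from the sample and its small-sample correction factor may differ slightly from the one above. The data below is simulated:

```python
import numpy as np
from scipy.stats import anderson

# Simulated standard-normal sample.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=100)

result = anderson(x, dist='norm')
print("A_D =", round(result.statistic, 3))
# Critical values at the 15, 10, 5, 2.5 and 1 % significance levels:
print("critical values:", result.critical_values)
print("significance levels (%):", result.significance_level)
```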

Shapiro-Wilk Test

Evaluates the quotient of the expected variance in a normal distribution and variance present

\[ b = \sum_{k=1}^{K} a_k \cdot (x_{N-k+1} - x_k) \]

\[ W = \frac{b^2}{(N - 1) \cdot s^2} \]
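scipy.stats.shapiro returns the W statistic and a p-value; a sketch with simulated data:

```python
import numpy as np
from scipy.stats import shapiro

# Simulated normal sample.
rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=3.0, size=50)

W, p = shapiro(x)
print(f"W = {W:.4f}, p = {p:.4f}")
print("normality not rejected" if p > 0.05 else "normality rejected")
```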

Python Cheat Sheet

Standard Deviation

import numpy as np
x_std = np.std(a=x, ddof=1)

np.std parameter explanation:

parameter description
a is the input array (can be a list, a NumPy array, or any iterable object)
ddof Delta Degrees of Freedom: the divisor used in the calculation is N - ddof (use ddof=1 for the sample standard deviation)

Confidence Interval (known σ & μ)

from scipy.stats import norm  # normal distribution
import numpy as np
import matplotlib.pyplot as plt

alpha = 0.05
N = 5
M = 10
mu = 3
sig = 0.5

OEG = norm.ppf(q=1-alpha/2, loc=mu, scale=sig/np.sqrt(N))
OEG = mu + norm.ppf(q=1-alpha/2, loc=0, scale=1) * (sig/np.sqrt(N))
# these two calls are equivalent:
# once scaled inside the function,
# once scaled outside
UEG = norm.ppf(q=alpha/2, loc=mu, scale=sig/np.sqrt(N))
print(OEG, UEG)

norm.ppf parameter explanation:

parameter description
q This is the probability for which you want to find the critical value (value must be between 0 and 1)
loc (optional) The location parameter (mean) of the distribution. The default value is 0
scale (optional) The scale parameter (standard deviation) of the distribution. The default value is 1.

Confidence Interval for μ (unknown σ & μ)

import numpy as np
from scipy.stats import t
from math import sqrt

# MEAN VALUE CALCULATION:
x_mean = np.mean(x)
print("x_mean:", x_mean)

# STD.DEVIATION CALCULATION:
x_std = np.std(x, ddof=1)  # sample standard deviation (ddof=1)
print("x_std", x_std)

# CONFIDENCE INTERVAL CALCULATION FOR X_MEAN:
c1_t = t.ppf(q=alpha/2, df=N-1)
c2_t = t.ppf(q=1 - alpha/2, df=N-1)
mu_c1 = x_mean - (c2_t * x_std) / sqrt(N)
mu_c2 = x_mean - (c1_t * x_std) / sqrt(N)

t.ppf parameter explanation:

parameter description
q This is the probability for which you want to find the critical value (value must be between 0 and 1)
df This is the degrees of freedom
loc (optional) The location parameter (mean) of the distribution. The default value is 0
scale (optional) The scale parameter (standard deviation) of the distribution. The default value is 1.

Confidence Interval for σ (unknown σ & μ)

import numpy as np
from scipy.stats import chi2

# MEAN VALUE CALCULATION:
x_mean = np.mean(x)
print("x_mean:", x_mean)

# STD.DEVIATION CALCULATION:
x_std = np.std(x, ddof=1)  # sample standard deviation (ddof=1)
print("x_std", x_std)

# CONFIDENCE INTERVAL CALCULATION FOR SIGMA:
c1_chi = chi2.ppf(q=alpha/2, df=N-1)
c2_chi = chi2.ppf(q=1 - alpha/2, df=N-1)
sigma1 = (x_std*x_std * (N-1)) / c2_chi
sigma2 = (x_std*x_std * (N-1)) / c1_chi

chi2.ppf parameter explanation:

Parameter Description
q This is the probability for which you want to find the critical value (value must be between 0 and 1)
df This is the degrees of freedom
loc (optional) The location parameter (mean) of the distribution. The default value is 0
scale (optional) The scale parameter (standard deviation) of the distribution. The default value is 1.


Difference of two mean values μ1 - μ2 with known variance

import numpy as np
from scipy.stats import norm

# MEAN VALUE CALCULATION:
x1_mean = np.mean(x1)
x2_mean = np.mean(x2)

# CONFIDENCE INTERVAL CALCULATION (sigma, N, M, alpha as defined above):
c1_z = norm.ppf(q=alpha/2)
c2_z = norm.ppf(q=1 - alpha/2)
mu_c1 = (x1_mean - x2_mean) - c2_z * np.sqrt((1/N) + (1/M)) * sigma
mu_c2 = (x1_mean - x2_mean) - c1_z * np.sqrt((1/N) + (1/M)) * sigma

Difference of two mean values μ1 - μ2 with unknown variance

import numpy as np
from scipy.stats import t
from math import sqrt

# MEAN VALUE CALCULATION:
x1_mean = np.mean(x1)
x2_mean = np.mean(x2)

# STD.DEVIATION CALCULATION:
x1_std = np.std(x1, ddof=1)
x2_std = np.std(x2, ddof=1)
x_std = np.sqrt(((x1_std**2)*(N-1)+(x2_std**2)*(M-1))/(N+M-2))

c1_t = t.ppf(q=alpha/2, df=N+M-2)
c2_t = t.ppf(q=1 - alpha/2, df=N+M-2)

mu_c1 = (x1_mean - x2_mean) - c2_t * sqrt((1/N) + (1/M)) * x_std
mu_c2 = (x1_mean - x2_mean) - c1_t * sqrt((1/N) + (1/M)) * x_std

Ratio of two sample variances

import numpy as np
from scipy.stats import t, f
from math import sqrt

# STD.DEVIATION CALCULATION:
x1_std = np.std(x1, ddof=1)
x2_std = np.std(x2, ddof=1)

# CONFIDENCE INTERVAL CALCULATION FOR RATIO OF VARIANCES
gamma = 1 - alpha  # confidence level
c1_f = f.ppf((1-gamma)/2, N-1, M-1)
c2_f = f.ppf((1+gamma)/2, N-1, M-1)
s_ratio1 = (x2_std**2)/(x1_std**2) * c1_f
s_ratio2 = (x2_std**2)/(x1_std**2) * c2_f

print(f"{s_ratio1} < sigma_ratio <= {s_ratio2}")

Plotting a Histogram with Confidence Interval

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# PLOT SETTINGS FOR HISTOGRAM:
plt.hist(x=x, density=True, edgecolor="black", label="Sample")
plt.axvline(mu_c1, color="red", label=r"$\mu_{min}$")
plt.axvline(mu_c2, color="yellow", label=r"$\mu_{max}$")
plt.xlabel("xLabel")
plt.ylabel("yLabel")
plt.legend(loc="upper left")

# CONFIDENCE INTERVAL PLOT:
x_axis = np.linspace(310, 350, 1000)  # adjust the range to the data
plt.plot(x_axis, norm.pdf(x_axis, loc=x_mean, scale=x_std),
         label="estimated normal\ndistribution")

# PLOT COMMAND:
plt.show()

plt.hist parameter explanation:

Parameter Description
x The input data for the histogram (an array or a sequence of arrays)
density (optional) If set to True, the histogram will display the probability density instead of raw counts. This is useful for comparing distributions of datasets with different sizes.
edgecolor (optional) You can set the color of the edges of the histogram bars.
label (optional) The label text in the legend

Mean value with known variance

from scipy.stats import norm  # normal distribution
import numpy as np
import matplotlib.pyplot as plt

alpha = 0.05
N = 5
M = 10
mu = 3
sig = 0.5

z_top = norm.ppf(q=1-alpha/2, loc=mu, scale=sig/np.sqrt(N))
z_top = mu + norm.ppf(q=1-alpha/2, loc=0, scale=1) * (sig/np.sqrt(N))
# these two calls are equivalent:
# once scaled inside the function,
# once scaled outside
z_bottom = norm.ppf(q=alpha/2, loc=mu, scale=sig/np.sqrt(N))
print(z_top, z_bottom)

dmu = np.arange(-2, 2.001, 0.001)
quality_function_left = norm.cdf(x=(z_bottom-(mu+dmu))/(sig/np.sqrt(N)))
quality_function_left = norm.cdf(x=z_bottom, loc=(mu + dmu), scale=(sig/np.sqrt(N)))
# these two calls are equivalent:
# once scaled via the distribution parameters,
# once scaled in the formula itself
quality_function_right = 1 - norm.cdf(x=z_top, loc=(mu + dmu), scale=(sig/np.sqrt(N)))
quality_function = quality_function_left + quality_function_right
# quality function: returns the probability of rejecting H0 when the
# true mean is mu + dmu, i.e. when H1 is the correct hypothesis

Python Code Cubic Model with variables

import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
import statsmodels.api as sm
import matplotlib.pyplot as plt
import matplotlib
import sys

diodeCurrent = pd.read_csv('diodecurrent.csv', sep=',')

u_diodeCurrent = diodeCurrent['U']
i_diodeCurrent = diodeCurrent['I']

modelDataDictionary = {'intercept': i_diodeCurrent,
                       'u_diodeCurrent': u_diodeCurrent,
                       'u_diodeCurrent_square': u_diodeCurrent**2,
                       'u_diodeCurrent_cube': u_diodeCurrent**3}

model1 = ols("intercept ~ u_diodeCurrent + u_diodeCurrent_square + u_diodeCurrent_cube",
             modelDataDictionary).fit()
print(model1.summary())

Prediction Range of Regression Function

This code builds on the cubic-model code above (Python Code Cubic Model with variables).

from scipy.stats import t

# Get the residuals of the fitted model
residuals = model1.resid

# Calculate the standard error of the residuals
# (4 = number of model parameters: intercept plus three powers of U)
residual_std_error = np.sqrt(np.sum(residuals**2) / (len(u_diodeCurrent) - 4))
# alternatively: residual_std_error = np.sqrt(model1.mse_resid)

# Set the confidence level (e.g., 0.99 for a 99% prediction interval)
confidence_level = 0.99

# Calculate the t-value for the given confidence level
t_value = t.ppf((1 + confidence_level) / 2, df=len(u_diodeCurrent) - 4)

# Generate voltage values for predictions
ud_pred = np.linspace(u_diodeCurrent.min(), u_diodeCurrent.max(), 100)
X_pred = pd.DataFrame({
    'intercept': np.ones_like(ud_pred),
    'u_diodeCurrent': ud_pred,
    'u_diodeCurrent_square': ud_pred**2,
    'u_diodeCurrent_cube': ud_pred**3
})

# Calculate predictions and prediction interval
id_pred_new = model1.predict(X_pred)

upper_bound = id_pred_new + t_value * residual_std_error
lower_bound = id_pred_new - t_value * residual_std_error

# Plot the original data points
plt.scatter(u_diodeCurrent, i_diodeCurrent, label='Original Data')

# Plot the cubic regression curve
plt.plot(ud_pred, id_pred_new, 'r', label='Cubic Regression')

# Plot the prediction interval
plt.fill_between(ud_pred, lower_bound, upper_bound, alpha=0.3,
                 label=f'{confidence_level*100}% Prediction Interval')

plt.title('Cubic Regression with Prediction Interval')
plt.xlabel('u_diode')
plt.ylabel('i_diode')
plt.legend()
plt.show()